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Abstract —The Vandermonde decomposition of Toeplitz ma¬ 
trices, discovered by Caratheodory and Fejer in the 1910s 
and rediscovered by Pisarenko in the 1970s, forms the basis 
of modern subspace methods for ID frequency estimation. 
Many related numerical tools have also been developed for 
multidimensional (MD), especially 2D, frequency estimation; 
however, a fundamental question has remained unresolved as 
to whether an analog of the Vandermonde decomposition holds 
for multilevel Toeplitz matrices in the MD case. In this paper, an 
affirmative answer to this question and a constructive method 
for finding the decomposition are provided when the matrix 
rank is iower than the dimension of each Toeplitz block. A 
numerical method for searching for a decomposition is also 
proposed when the matrix rank is higher. The new results 
are applied to studying MD frequency estimation within the 
recent super-resolution framework. A precise formulation of the 
atomic Iq norm is derived using the Vandermonde decomposition. 
Practical algorithms for frequency estimation are proposed based 
on relaxation techniques. Extensive numerical simulations are 
provided to demonstrate the effectiveness of these algorithms 
compared to the existing atomic norm and subspace methods. 

Keywords: The Vandermonde decomposition, multilevel 
Toeplitz matrix, multidimensional frequency estimation, super¬ 
resolution, atomic norm. 

I. Introduction 

The Vandermonde decomposition is a classical result by 
Caratheodory and Fejer dating back to 1911 0. To be specific, 
suppose that T is an n x n positive semidefinite (PSD) Toeplitz 
matrix of rank r < n. The result states that T can be uniquely 
decomposed as 

T = APA^, (1) 

where P is an r x r positive definite diagonal matrix and 
A is an n X r Vandermonde matrix whose columns corre¬ 
spond to uniformly sampled complex sinusoids with different 
frequencies. The result became important in the area of data 
analysis and signal processing when it was rediscovered by 
Pisarenko and used for frequency retrieval from the data 
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covariance matrix 0 . From then on, the Vandermonde decom¬ 
position, also referred to as the Caratheodory-Fejer-Pisarenko 
decomposition, has formed the basis of a prominent subset 
of methods designated as subspace methods, e.g., multiple 
signal classification (MUSIC) and estimation of parameters 
by rotational invariant techniques (ESPRIT) (see the review in 

@)- 

The problem of estimating multidimensional (MD) frequen¬ 
cies arises in various applications including array processing, 
radar, sonar, astronomy and medical imaging. Inspired by the 
results in the ID case, several computational subspace methods 
have been proposed for 2D frequency estimation such as 2D 
MUSIC 0, 2D ESPRIT 1^, 1^, matrix enhancement and 
matrix pencil (MEMP) Q and the multidimensional folding 
(MDE) techniques ||9|-|[ll|. However, a fundamental question 
remains unresolved as to whether an analog of the Vander¬ 
monde decomposition result holds true in the 2D or more 
general in the MD case. Note that the data covariance matrix 
corresponding to MD frequency estimation is a multilevel 
Toeplitz (MET) matrix (see the definition in the next section). 
Consequently, the question can be phrased as follows; 

Given a PSD, rank-deficient MET matrix, does it 
always admit a Vandermonde-like decomposition 
parameterized by MD frequencies? In other words, 
can this matrix be always the covariance matrix of 
an MD sinusoidal signal? 


An answer to the above question has recently become 
important due to the super-resolution framework in GD which 
studies the recovery of fine details in a sparse (ID or MD) fre¬ 
quency spectrum from coarse scale time-domain samples. With 
compressive measurements super-resolution actually general¬ 
izes the compressed sensing problem in GD to the continuous 
(as opposed to discretized/gridded) frequency setting and is 
referred to as off-grid or continuous compressed sensing G3’ 
GH- The paper proposed a convex optimization method 
based on the atomic norm (or total variation norm, see GD’ 
p7| ) and proved that in the noiseless case the frequencies 
can be recovered with infinite precision provided that they 
are sufficiently separated. Unlike other methods, the atomic 
norm method is stable in the presence of noise and can deal 
with missing data |14|, G3-GD- It was also extended to the 
multi-snapshot case in array processing p3) , p2) . Moreover, 
a reweighted atomic norm method with enhanced sparsity and 
resolution was proposed in | [23) . 

The atomic norm is a continuous counterpart of the E 
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norm. A finite-dimensional formulation of it is required for 
numerical computations. In the ID case a semidefinite program 
(SDP) formulation was provided based on the Vandermonde 
decomposition of Toeplitz matrices p^ , p2) . However, in the 
MD case a similar result has not been available as an analog of 
the Vandermonde decomposition was unknown. Interestingly, 
an SDP formulation with unspecified parameters was derived 
in based on duality and the bounded real lemma for 
multivariate trigonometric polynomials p5j , which indeed is 
related to an MLT matrix whose dimension though is left 
unspecified^ A relaxed version of this formulation has also 
been applied in Il26), ^ to the 2D case. 

In this paper, we generalize the Caratheodory-Fejer’s result 
from the ID to the MD case and provide an affirmative answer 
to the question asked above in the case when the matrix rank 
is lower than the dimension of each Toeplitz block. The new 
matrix A in the resulting decomposition (see ([T]i), which is 
still called Vandermonde, is the Khatri-Rao product of several 
Vandermonde matrices. A constructive method is provided for 
finding the Vandermonde decomposition. When the matrix 
rank is higher a numerical approach is also proposed that 
is guaranteed to find a Vandermonde decomposition if some 
conditions are satisfied. 

To demonstrate the usefulness of the Vandermonde decom¬ 
position presented in this paper, we study the MD super¬ 
resolution problem with compressive measurements. A precise 
formulation of the atomic norm is derived based on the 
decomposition. Practical algorithms for solving the atomic 
norm minimization problem are proposed based on convex 
relaxation as well as on nonconvex relaxation and reweighted 
minimization. Frequency retrieval is finally accomplished us¬ 
ing the proposed Vandermonde decomposition algorithms. 
Numerical results are provided to demonstrate the advantage 
of the proposed solutions over the state-of-the-art. 

A. Connections to Prior Art 

Similarly to this paper that generalizes the Caratheodory- 
Fejer’s Vandermonde decomposition from the ID to the MD 
case, other related generalizations have also been attempted in 
the literature. In | |28) the uniqueness part of the Caratheodory- 
Fejer’s result was generalized to the MD case. Different from 
our result, focused on the identifiability of parameters 
of MD complex exponentials given discrete samples of their 
superposition. An analogous decomposition was presented 
in for state-covariance matrices (including the Toeplitz 
case) of stable linear filters driven by certain time-series. 
The author also studied its multivariable counterpart in pO) . 
Though the state-covariance matrices in m include block 
Toeplitz matrices, which further include the MLT ones, the 
decomposition derived in pO) is less structured than and does 
not imply the decomposition given in this paper. A similar 
decomposition of block Toeplitz matrices as in pO) was also 
introduced in m which, as we will see, is useful for deriving 
the result of this paper. A recent paper p^ attempted to 

* Strictly speaking, the SDP formulation of the atomic norm given in |24) is 
a relaxed version since it is based on the so-called sum-of-squares relaxation 
for nonnegative multivariate trigonometric polynomials 1251. 


generalize the Vandermonde decomposition to the 2D case 
and provided a result similar to Theorem [T] in the present 
paper; however, its proof is incomplete and some derivations 
are flawed. In fact, its proof is almost identical to that in 1311 
for block Toeplitz matrices (see Remark]^. 

Toeplitz matrices can be viewed as discrete counterparts 
of Toeplitz operators that together with Hankel ones form an 
important class in operator theory. In the ID case, Kronecker 
discovered in the nineteenth century a Vandermonde-like de¬ 
composition of Hankel matrices that holds in general with the 
exception of degenerate cases p^ , p4) . The Vandermonde 
decomposition by Caratheodory and Fejer can be viewed as 
a more precise result of Kronecker’s Theorem obtained by 
imposing the condition of PSDness that completely avoids the 
degenerate cases. A recent paper p5) studied the multivariable 
Hankel operators and showed that Kronecker’s Theorem in the 
MD case differs from its ID counterpart in several key aspects. 
In particular, an ML Hankel matrix often does not admit a 
Vandermonde-like decomposition. In contrast, we show in this 
paper that by imposing the PSDness all the MLT matrices 
admit a Vandermonde decomposition under an appropriate 
rank condition. In this context we note that it would be of 
interest to investigate the continuous counterpart of the result 
of this paper for multivariable Toeplitz operators. 

The Vandermonde decomposition of MLT matrices pro¬ 
vides the theoretical basis of several subspace methods for 
2D and higher-dimensional frequency estimation proposed in 
the 1990s, which typically estimate the frequencies from an 
estimate of the MLT covariance matrix (see, e.g., 0-f81). The 
super-resolution methods presented in this paper are inspired 
by p3) and also can be viewed as covariance-based methods 
similarly to the subspace methods (see also pO) , p2)). But the 
main difference is that in this paper the covariance estimates 
are obtained by optimization of sophisticated covariance-fitting 
criteria that fully exploit the MLT structure and utilize signal 
sparsity. Moreover, the used criteria work in the presence 
of missing data. Note that 2D frequency estimation has also 
been studied in 


J, |37|. The paper p2| analyzed the 
performance of the atomic norm method, which generalizes 
the result in HD from the ID to the 2D case. Both | [^ and 
1371 exploited the low-rankness of a certain 2-level Hankel 
matrix formed using the data samples. Since the methods in 
these papers actually can be applied to the case of general 
complex exponentials, their use for the frequency estimation 
of sinusoids appears to produce suboptimal results. Moreover, 
as mentioned above a Vandermonde-like decomposition may 
not exist for an ML Hankel matrix [35) and therefore the use of 
the decomposition for parameter retrieval can be problematic. 


B. Notations 

Notations used in this paper are as follows. K, C and 
Z denote the sets of real, complex and integer numbers 
respectively. T denotes the unit circle [0,1] by identifying the 
beginning and the ending points. Boldface letters are reserved 
for vectors and matrices. and denote the matrix transpose 
and the Hermitian transpose. H-Hj^ and 11-112 represent the 
and ^2 norms. For two vectors a and b, a < b is understood 
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elementwise. For two square matrices A and B, A > B 
means that A~ B is positive semidefinite. 

C. Organization of This Paper 

The rest of the paper is organized as follows. Section 
|n] introduces some preliminaries. Section presents the 
main contribution—the Vandermonde decomposition of MLT 
matrices—as well as the methods for hnding the decomposi¬ 
tion. In Section |IV] the obtained results are applied to studying 
the MD super-resolution problem using an atomic norm 
method. In Section |V] extensive numerical simulations are 
provided to validate the theoretical findings and demonstrate 
the performance of the proposed super-resolution approaches. 
Section [Vl| concludes this paper. 

II. Preliminaries 
A. Toeplitz and MLT Matrices 

Given a complex sequence u = [uk], k G Z, an n x n 
Toeplitz matrix r„ is defined as 



Uq 

Ul 

ttn—l 



Tn ■■= 

U-l 

uo 

Un-2 


(2) 


J^l — n 

U2-n ■ 

Uo 



For d > 2, let n = 

= ini,. 

..,nd) and n_i = 

in2,. 

■,nd). 


Given a d-dimensional (dD) complex sequence u = [uk], 
k G Z^, an n, d-level Toeplitz (dLT) matrix Tn is dehned 


recursively as: 



Ton-i 

T(-l)n-i 

• ■ • ^(ni — 

• ■ • ^(ni—2)n_i 

Tn ■■= 


T (i-n,)n^. 

^(2—• ■ • ^Oti_i 

where for fci = 1 — ni , 

..., ni — 1 , denotes an n 


(d — 1)LT matrix formed using [itfc], —n_i < k_i < rii. It 
can be seen from (|^ that T„ is an ni x ni block Toeplitz 
matrix in which each block is an n_i, (d — 1)LT matrix and 
thus T„ G where N = nf=i™ example, in 

the case of n = (2,2) we have that 


Uoo 

Uoi 

uio 

Mil 

Wo(-l) 

Uqo 


Mio 

^*(-1)0 


^00 

MOI 


n{-i)o 


Moo. 


Note that a 2LT matrix is also called Toeplitz-block-Toeplitz 
or doubly Toeplitz in the literature. For notational simplicity, 
we will omit the index n in T„ when it is obvious from the 
context. 

B. Vandermonde Decomposition and Frequency Estimation 

The Vandermonde decomposition of Toeplitz matrices is the 
basis of subspace methods for ID frequency estimation. To be 
specific, let a„ (/) = n-s [l, ..., S C” 

denote a uniformly sampled complex sinusoid with frequency 
/ S T and unit power, where i = s/—l. It follows that 


for / e r. An if) ■- [an ifi ),..., a„ ^/r)] e is a 

Vandermonde matrix (up to a factor of n~ s). Let us consider 
the parametric model for frequency estimation 

r 

y = Anif)c = Y^ Cjan ifj) , (5) 

where cj = |cj|e®'^’ S C are complex amplitudes, are 
initial phases, and y G C" denotes the sampled data. Assume 
that the sinusoids have i.i.d. random initial phases. It follows 
that P Ecc^ = diag ^|ci|^ ,..., . Then, the data 

covariance matrix 

r 

R = Eyy^ = (/) PA^ (/) = ^ \c/ an (/,) a^ (/,) 

i=i 

( 6 ) 

is a rank-r PSD Toeplitz matrix (assuming that r < n and fj, 
j = 1,... ,r are distinct). The sequence u used to generate 
the Toeplitz matrix R is given by 

r 

Uk = '^\cjf l — n<k<n—l. (7) 

The Vandermonde decomposition states that the converse is 
also true. That is, any PSD Toeplitz matrix R of rank r < n 
can always be uniquely decomposed as in Consequently, 
the frequencies can be unambiguously retrieved from the data 
covariance [note that in the presence of white noise, the noise 
contribution to R can also be identified (see |[^)]. In practice, 
R can only be approximately estimated and subspace methods 
like MUSIC and ESPRIT have been proposed to carry out the 
frequency estimation task. 

In the MD case, let / G denote a set of r, dD 

frequencies f.j G T"^, j = l,...,r, where f.j can be 
understood as the jth column of f. Let fi denote the /th 
row of / or the set of frequencies for the Zth dimension, and 
fij be the ((,j)th entry. A uniformly sampled dD complex 
sinusoid with frequency f .^ and unit power can be represented 
by an (f.j) := an, (fij) O • ■ ■ ® (fdj) G C^, where 
® denotes the Kronecker product and the index n implies 
that the sample size is ni along the (th dimension. It follows 
that An if) ■■= [an if a) ,---,an if.,,-)] = An, (/i) * • • • * 
And ifd) ^ where * denotes the Khatri-Rao product 

(or column-wise Kronecker product). Due to the fact that 
An if) can be written as the Khatri-Rao product of d Vander¬ 
monde matrices, we still call An if) a Vandermonde matrix. 
In the problem of MD frequency estimation, the sampled data 
y follows a similar parametric model: 

r 

y^Anif)c = '^Cjan{f.,j). (8) 

i=i 

Under the same assumption on the initial phases, the data 
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covariance matrix 

R = (/) PA^ (/) 

= ar^ (/:j)a^ (/:j) 

3 = 1 (9) 

r d 

i=i i=i 

turns out to be a PSD n, c?LT matrix of rank no greater than 
r and is generated by the sequence 

r 

Uk = ''^\cjf —n<k<n. (10) 

3 = 1 


the main part of the proof in ( [52^ is nothing but the first 
step of ours that follows from H31^ and holds for general 
block Toeplitz matrices (see below). Moreover, Eq. (44) in 
& which provides a Vandermonde decomposition ofT and 
concludes Proposition 2 in (HI/, does not hold true. To see 
that, consider the case where {f 2 j}i^-i have identical entries. 
Then, the {f 2 jYj^i constructed in [3^ Eq. (40)] are not 


unique (note that a typo exists in [32~Eq. (40)] where fi 


should be f 2 i). It follows that the Vandermonde decomposition 
constructed in I132\ Eq. (44)] is not unique either, which cannot 
be true according to Theorem 00/ this paper. 

To prove Theorem [T] we first consider the uniqueness part 
which essentially follows from the following lemma. 


The fundamental question as to whether also the converse 
holds true has remained unresolved, though numerical tools 
such as extensions of MUSIC and ESPRIT have been de¬ 
veloped for 2D frequency estimation from a data covariance 
estimate. 

III. Vandermonde Decomposition oe MET Matrices 
A. Generalizing the Vandermonde Decomposition 

A main contribution of this paper is summarized in the 
following theorem which generalizes the Caratheodory-Eejer’s 
result from the ID to the MD case (note that we will omit the 
index n in T„, a„ and for simplicity). 

Theorem 1. Assume that T is a PSD dLT matrix with d > 1 
and rank (T) = r < min^ rij. Then, T can be decomposed as 

r 

T = A{f) PA (/) = , (11) 

3 = 1 

where P = diag (pi,... ,pr) with pj > 0, f .j, j = 1,..., r 
are distinct points in T/ and the (d + l)-tuples (^f.j,pj^, 
j = 1,... ,r are unique. 

Remark 1. The sufficient condition that rank (T) = r < 
min_, rij of Theorem is tight. Indeed, for r > min^ rij 
we can always find T matrices that admit infinitely many 
Vandermonde decompositions of order r. Consider the case 
of r = minj rij as an example. For d = 1, first note that T is 
invertible. For any fi S T, let pi = (fi)T^^ai (fi)] 
Then it is easy to see that T — piai (/i) (/i) is a PSD 

Toeplitz matrix of rank r — 1 and thus it has a unique 
Vandermonde decomposition of order r — 1. This means that 
we have found a Vandermonde decomposition of order r 
for T. Since fi above can be chosen arbitrarily, there exist 
infinitely many such decompositions. For d > 2, without loss 
of generality, assume that ni = r and n 2 ,...,nd > r. 
Then for any PSD Toeplitz matrices G w/f/j 

rank{Tnf) = r and rank{Tni) = 1, I = 2,... ,d, the dLT 
matrix T = Tm admits infinitely many Vandermonde 

decompositions of order r because Tn^ does so (as shown 
previously). 

Remark 2. For d = 2 a similar result to Theorem [7] was 
recently presented in j [32) Proposition 2]; however, its proof is 
incomplete and certain derivations are flawed. In particular. 


Lemma 1. Assume that are d > 1 

sets of distinct points in T. Then, 

{it {flji 5 ■ ■ • ) fdjd^ • jl 1) • ■ ■ ; Ti/, I 1,..., d} (12) 
are linearly independent. 

Proof: Eor d = 1 the result is well known and its proof 
is therefore omitted. It follows that (//), Z = 1,..., d are 
all invertible. Eor d > 2, note that the vectors in ( [T2l ) form the 
N X N matrix A^ (fi). We complete the proof by the 

fact that 

C d \ d d 

(/;) I = ]^rank(A„, (/;)) = JJnz = TV. 

1=1 / i=i i=i 

(13) 


The existence of the Vandermonde decomposition is proven 
via a constructive method. The proof is motivated by the 
decomposition of block Toeplitz matrices given in pT) , the 
proof of which is also provided for completeness. 


G C 


mnxmn 


be an n X n PSD 


Lemma 2 ( |M|). Let 
block Toeplitz matrix with rank = r. Then there exist 

V = [..., rtj,... ] G and fj G T, j = 1,..., r such 

that can be decomposed as 


= {fj) af {fj) ® Vjvf 

3 = 1 
r 

= ® ® ■ 
3 = 1 


(14) 


Proof: Since is PSD with rank = r, there 

exists Y G such that = YY^. Write Y as 




Y Y 

r Q , . . . , r n-1 


with Y j G 


Y = 

Define the upper submatrix Yjj = 


the lower submatrix Y t = 


Toeplitz structure of it holds that 


rH 


, j = 0, ...,n-l. 

H 

and 

1 H 

. By the block 


VH 

To J ■ ■ ■ ! T „_2 




YuY^=YlY^. 


(15) 
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Thus there exists a unitary matrix U G such that Yl = 


YjjU (see, e.g., |38 Theorem 7.3.11]). It follows that 

H 


Y = 


YoAYoU) 


H 


YoU' 


n—l 


H 


(16) 


Let Ti, I = l — denote the matrix on the hh block 

diagonal of . So we have that 

Ti=YoU-‘Y^, Z = l-n,...,n-l. (17) 


Next, write the eigen-decomposition of the unitary matrix U, 
which is guaranteed to exist, as 

U = UZU^, (18) 

where Z = diag , Zj,...) and U is another unitary matrix. 
The eigenvalues Zj, j = 1,... ,r have magnitude of 1 and 
thus Zj = fj e T. Inserting •El into ( [T7 ] i and letting 

V = YqU, we have that 

r 

Ti = VZ-W^ = (19) 

i=i 

where Vj denotes the jth column of V. Finally, ( [l4| ) follows 
from ( [T^ . ■ 

To derive the Vandermonde decomposition based on Lemma 

1^ a key result that we will use is the following. 

Lemma 3. If a dLT matrix T, d > 1, can be written as 

T = A{f) CA^ if) , (20) 


where C G j, ^ miiij nj and f.j, j = l,...,r are 

distinct points in T'^, then C must be a diagonal matrix. 

The proof of Lemma is somewhat complicated and thus it 
is deferred to Appendix|A| Note that we first prove the result in 
the case of d = 1 using the Kronecker’s Theorem for Hankel 
matrices | |34) , | |^ and the connection between Toeplitz and 
Hankel matrices. We then complete the proof for d > 2 using 
induction. 

Proof of Theorem ^ We first show that the Vandermonde 
decomposition in ([^ I, if it exists, is unique. To do so, suppose 
that T admits another decomposition 

T = A (/') P'A^ (/') , (21) 

where, as in ([TT]), P' = diag(p'i,...,p;) with p'j. > 0, 
and f'.f., k = l,...,r are distinct points in It fol¬ 
lows that A (/') P'A^ (/') = A (/) PA^ (/) and thus 
A (/') P '2 = A{f) P^U', where [/' is an r x r unitary 
matrix [ |3^ Theorem 7.3.11]. So we have that 

A{f')=A{f)p--U'P'-K (22) 

This means that each a is a linear combination of 

{a {f.j) Yj^i' in other words, for fc = 1,..., r, the r -f 1 < 
min^ rij vectors |a {f'.Y, ^ ifi), ■ ■ ■ ,o, (f.r)} are linearly 
dependent. By Lemma [b this can be true only if f'.^. G 
{f-.jVj=V implying *at C ®y ^ similar 

argument, we can also show that {f jY._^ C {f'kYk-v 
Consequently, {f'jYj-i ^nd {f jYj-i identical. Then it 


follows that the coefficients {pj} and {p'j} are identical as 
well. 

We use induction to prove the existence part. First of all, 
for d = 1 the result turns out to be the standard Vandermonde 
decomposition of Toeplitz matrices and thus it holds true. 
Suppose that a decomposition as in o exists for d = dp — 1, 
do > 2. It suffices to prove that it also exists for d = do. We 
complete the proof in three steps. In Step 1, by viewing T as 
an TT-i X ni block Toeplitz matrix and applying Lemma we 
have that 


EK 


{hj)an, iflj) 




H 


(23) 


ifij) O (a„, (/y) O Vj) 


H 


where fij G T, j = l,...,r. The following identity that 
follows from ( [T^ will also be used later; 

T..i = VZ[V^, ( = 0,...,ni - 1, 


(24) 


where Zi = diag (e 


,i27r/i 


1 6 ' 


i2Trfi, 


)■ 


H 


In Step 2, we consider the first block of T, Tq = W 
that is a PSD (do — 1)LT matrix. Let r' = rank (To) < 
r < minj rij. By the assumption that a Vandermonde de¬ 
composition exists for d = do — 1, To admits the following 
Vandermonde decomposition; 


To — Ar 


(Y) 


PA 


H 


= Epj“^ 

i=i 


. (/- 


i.i 


. (LO 


(25) 


where /_i j is the jth column in f_^ G T(‘^“ i)xr^ f-ij’ 
j — l,...,r' are distinct, and P = diag (pi,... ,pr') 
with Pj > 0, j = l,...,r'. Because Tq — VV^ = 


A 

7.3.11] 


.(/-.) 


PA 


H 


J it holds that |38 

. (L.) 


V = A 


P"0, 


Theorem 


(26) 


where O G C’’ and OO^ = I. Inserting ( |2^ into ( |24l i, 
we have that 


T_, =A. 


(L.) 


) P^OZ[O^P^^ 


iH 


. (/-.) . 


(27) 


Z = 0,..., ni — 1. 


It immediately follows from Lemma |3| that P^ 0Z\0^P^, 
Z = 0,1,..., ni — 1 are diagonal matrices and so are OZ^O . 

In Step 3, we show that O and V are structured, which 
together with ( |23| l leads to a decomposition of T as in O- 
To do so, let 


D{1) :=0Z[0^ = Z = 0,l...,ni-1 

i=i 

(28) 

be a series of diagonal matrices, where O {j) = OjO^ and 
Oj is the jth column of O. First consider the case when fij, 
j = l,...,r are distinct. For the (m,n)th entry of D{1), 
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denoted by Dmn (1), we have the following linear system of 
equations whenever m ^ n: 


'0 


-Dmn(O) 



0 

= 

DmnY) 

e 

II 

OmnY) 

0 


Pmnijtt 3)_ 


OninY )_ 


where An-^ [fi) has full column rank since r < ni. It 
immediately follows that Omn{j) = 0, j = when 

m ^ n, i.e., O (j) are diagonal matrices. Moreover, each O (j) 
contains at most one nonzero entry on its diagonal since its 
rank is at most 1. This means that Oj has at most one nonzero 

entry and hence, Vj = An_^ (^f Oj is the product of 

a scalar and some column in As a result, we 

obtain from ( |2T| l a decomposition of T as in 

We next consider the other case when some /ij’s are 
identical. Without loss of generality, we assume that fij, 
j = 1,... ,ro < r are identical and different from the others. 
By similar arguments we can conclude that ^ 

diagonal matrix of rank at most rg. Then, 




2 = 1 


(/_i) (Eo.oj^ j (/_i) (30) 




' u 

= (/ -Ij) CLn^x (/ ~l,j) ! 

2 = 1 

ere pj > 0 and ^ G ■ ■ •,} 

.., ro- Inserting ( [3Q] t into ( [2^ , we have that 

/ ro 

T = a„^ (/ii) (/n) 0 ^ 


for j = 


2^ 

3 




+ E “"1 ® 

j^ro + 1 

2 = 1 


(31) 


\H 


+ E/ i^ni iflj) ® Vj) (dm (/l2)®^2)' 

j=rQ + l 

Therefore, by similarly dealing with fij, j = rg +1,..., r we 
can obtain a decomposition of T as in O- ■ 


B. Finding the Vandermonde Decomposition 

The proof of Theorem provides a constructive method 
for finding the Vandermonde decomposition of dLT matrices. 
In the case of d = 1, the result becomes the conventional 
Vandermonde decomposition which can be computed using 
the algorithm in Q or subspace methods (or the matrix pencil 
method introduced later). For d > 2, by viewing T as an 
ni X ni block Toeplitz matrix, we can first obtain a decomposi¬ 
tion as in ( |2^ following the proof of Lemma|^ Then it suffices 


to find the Vandermonde decomposition of the (d — 1)LT 
matrix Tg as in ( |25| l. The pairing between fij, j = 1,... ,r 
and f-ij, j = = rank(Tg) can be automatically 

accomplished via ( |26] ) in which O = P ^ AI^_^ (•^~^) 
where 3^ denotes the matrix pseudo-inverse. As a result, the 
Vandermonde decomposition can be computed in a sequential 
manner, from the IL to the 2L and finally to the dL case. 

We next study in detail the computations of fij, j = 
l,...,r and V in ( |2^ . Recalling the proof of Lemma 
we will use the identities T = YY^, Y^ = YjjU, U = 
UZiU and V = YqU, where Zi = diag (^n,..., zi^) 
with zij = j = l,...,r. We consider the matrix 

pencil ^Y^YljY^Yu'J. It holds that 

Y^Yl-\Y^Yu = Y^YuIJZiU''-XY^Y u 

= Y§YuUiZi-XI)U 

and thus 

(y^Yl-zi,Y^Yu)u,=0, j = l,...,r, (33) 

where Uj denotes the jth column of U. This means that zij 
and Uj, j = 1,... ,r are the eigenvalues and eigenvectors of 
the matrix pencil (y^Yl,Y^Yij^ whose computation is a 
generalized eigenproblem. Following this observation, the al¬ 
gorithm described above for the Vandermonde decomposition 
is designated as the matrix pencil and (auto-)pairing (MaPP) 
method. Since the main computational step of MaPP is the 
factorization T — YY^, MaPP requires O (^N'^r) flops. 


C. The Case of r > min^ rij 

As mentioned in Remark the condition that rank(T) = 
r < minj rij of Theorem [T] is tight. Even so, it is of interest 
to study the existence of a Vandermonde decomposition and 
to And it in the case of r > min^ nj. A numerical algorithm 
to do this is provided in this subsection. The algorithm is 
motivated by the following observations that hold true when 
a decomposition indeed exists; 


1) Viewing T as an ni xni block Toeplitz matrix, {fijYj^i 
can be computed using the matrix pencil method de¬ 
scribed previously. 

2) Let Pi, I = 2,... be permutation matrices that are 
such that 


'Pi (R) ifL) = an, (//) 


' (/m) 


\m—l 


im^l 


for any f G Then, 


2-(i) ^ ViTPf 


remains a dLT matrix by exchanging the roles of the 
first and the Zth dimension of the dD frequencies in the 
Vandermonde decomposition. It follows that {fijYj^i, 
I = 2,... ,d can be computed up to re-sorting similarly 
to {fijYj^i- 
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3) Let T = be a truncated eigen- 

decomposition with (Ti > • • • > CTr > 0. For any 
j = in the Vandermonde decomposition, 

EL=1 ||«ma(/:j)||2 = 1 is the maximum value of the 
function g (/') = YL=i (/') II 2 ’ f' ^ 

The algorithm is implemented as follows. We hrst compute 
{fijVj=v ^ ~ 1, ■ ■ ■ ,d separately from I = I,... ,d 

using the matrix pencil method as in the last subsection, where 
j'W „ j’ Since the fij in I = is not 

necessarily the fij in the correct d-tuple f .j G T , we then 
form the correct d-tuples f.^, j = such that the 

•^2 

equation X]m=i (/ j) II 2 = 1 is satished. Finally, the 

coefficients pj, j = 1,..., r are obtained using a least-squares 
method, and a Vandermonde decomposition of T is found if 
the residual is zero. The algorithm is still called matrix pencil 
and pairing (MaPP). Note that when the matrix rank is lower 
than the dimension of each Toeplitz block, MaPP is guaranteed 
to hnd the unique Vandermonde decomposition in which the 
pairing is done automatically. When the matrix rank is higher, 
the pairing is done using a search method and MaPP is not 
guaranteed to hnd a decomposition. In the latter case, MaPP 
requires O [N"^r + Hops. Note that MaPP is similar 

to the matrix pencil method in |j^ for 2D frequency estimation 
from a Hankel matrix formed using the data samples or from 
a data covariance estimate. 

For Z = 1,..., d, let e; G {0, be the vector with one at 
the /th entry and zeros elsewhere. This means that ei,..., 
form the canonical basis for A theoretical guarantee for 
MaPP is provided in the following theorem. 

Theorem 2. Assume that rank(Tn-e^) = ■■■ = 

rank{Tn-f,d) = rank{T) = r. Then MaPP is guaranteed to 
find an order-r Vandermonde decomposition of T as in (HB 
if it exists. 


- (0 

T as a principal submatrix of the ni x ni block Toeplitz 
matrix in ( [35] l obtained by removing the blocks in the 
last row and column. It follows from previous arguments that 
if rank = r, then can be uniquely found by 

W ( 1 ) 

MaPP. In fact, this can be readily shown by the fact that T 
is identical to Tn-ei up to permutations of rows and columns. 


Remark 3. It is easy to check that the assumption of Theorem 
2 is satisfied under the condition that r < minj rij of Theorem 
1 Moreover, in the ID case the assumption of both Theorem 
1 and Theorem turns out to be N = ni > r, as expected. 


The following result suggests that the assumption of Theo¬ 
rem]^ is weak and therefore MaPP can be expected to work 
for many MLT matrices. 


Proposition 1. Assume that 


r < N — 


N 

min; m 


(37) 


and that T is given by o in which pj > 0, j = 1,..., r 
and the dr frequencies fij, I = 1,..., d, j = 1,..., r are 
drawn from a distribution that is continuous with respect to the 
Lebesgue measure in Then, the assumption of Theorem 
1^ is satisfied almost surely. 


Proof: Since T — A (/) PA^ (/), it is 
that {f)PA^_^^ if). By Eo 


4], under the assumptions of Proposition [T] it 
surely that rank(A„_ei (/)) = ••• = rank (A 
rank(A(/)) = r since r < min; 


easy to see 
Proposition 
lolds almost 

-e. if)) = 
= N - 


N 


. Then the stated result follows by making use of 
the fact that rank(T) = rank(A(/)) and rank(T„_ei) = 
rank(A„_e, (/)), Z = l,...,d. ■ 


Proof: Suppose that T admits an order-r Vandermonde 
decomposition as in O- It suffices to show that the sets of 
frequencies {fijYj^i, Z = 1,..., d are unique and that MaPP 
can Hnd them. We Hrst consider the computation of fij, j = 
1,..., r using MaPP. Note that Tn-ei is a principal submatrix 
of the Hi X Til block Toeplitz matrix T obtained by removing 
the blocks in the last row and column. Following the proof 
of Lemma suppose that T = YY^, Y G The 

assumption that rank(rri_ei) = rank(r) = r implies that 
Y u has full column rank. It follows that the unitary matrix 
U is unique given Y. Moreover, the matrix pencil method is 
able to Hnd fij, j = 1,... ,r. 

We next show the uniqueness of fij, j = 1,... ,r. Suppose 
that T — Y'Y'^, Y' G Then there must exist a unitary 

matrix U' such that Y' = YU'. It follows that the new matrix 
pencil 

Y'YY'^ - XY'YY'u = U'^ (y^Y^ - XY^Yu) U' 

(36) 

has the same eigenvalues as Y^Yj^ — XY^Yjj, which proves 
the uniqueness. 

We now consider the case of I > 2. Similar to Tn-e^, 
we deHne the (n; — 1, ni,..., n;_i, n;+i,..., rid), dLT matrix 


IV. Application to MD Super-Resolution 
A. MD Super-Resolution via Atomic Iq Norm 

The concept of super-resolution introduced in m refers 
to the recovery of a (ID or MD) frequency spectrum from 
coarse scale time-domain samples by exploiting signal spar¬ 
sity. It circumvents the grid mismatch issue of several recent 
compressed sensing methods by treating the frequencies as 
continuous (as opposed to quantized) variables. In this paper 
we tackle the problem using an atomic £0 (pseudo-)norm 
method instead of the existing atomic norm method. The 
reasons are three-fold. Firstly, the atomic £q norm exploits the 
sparsity to the greatest extent possible, while the atomic norm 
is only a convex relaxation. Secondly, the study of atomic 
norm methods suffers from a resolution limit condition which 
is not encountered in the analysis of the atomic £0 norm 
approach (see the ID case analysis in p2) , |[23)). Lastly, 
we show that a Hnite-dimensional formulation exists that can 
exactly characterize the atomic £0 norm, whereas parameter 
tuning remains a challenging task for the atomic norm (see 

IH). 

Consider the parametric model in (|^. We are interested 
in recovering f G given a set of linear measurements 
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of y G . Without loss of generality, we assume that the 
measurements are given by jz = Ly € where L denotes 
a linear operator. Then all possible vectors y form a convex 
set C := {y : 2 : = Ly}. The atomic £0 norm of y is dehned 
as 


1 . 4,0 •“ 


inf 


K 


K 

i=i 


a{f,. 


(38) 


We propose the following approach for signal and frequency 
recovery by exploiting signal sparsity: 


min ||y|l^ 0 , subject to y G C. (39) 

V ’ 

Therefore, we estimate y using the sparsest candidate y*, 
which has an atomic decomposition of the minimum order, and 
in this process we obtain estimates of , j = 1,..., r using 
the frequencies in the atomic decomposition of y*. Regarding 
theoretical guarantees for the atomic £0 norm minimization 
method, we have the following result. 

Proposition 2. Given z = Ly where y is defined in ([^, the 
atomic decomposition of the optimizer of exactly recovers 
the frequencies f .j, j = 1,.. ., r if and only if they can be 
uniquely identified from z. 


Proof: We hrst show the ‘if’ part using a proof by 
contradiction. To do so, suppose that the atomic decomposition 
of the optimizer of ( [39l l cannot recover the frequencies f .j, 
j = 1,... ,r. This means that there exist f'.j, j = 1,... ,r' < 
r and y' G C satisfying that y' = {f'j)- follows 

that 2 ; = Ly' = Cji'O (/-j)- Hence, the frequencies 

cannot be uniquely identified from 2 :, which contradicts the 
condition of Proposition 

Using similar arguments we can show that, if the frequen¬ 
cies cannot be uniquely identihed from 2 ;, then they cannot 
be recovered from the optimizer of ( |39l ). This means that the 
‘only if’ part also holds true. ■ 

Proposition [^establishes a link between the performance of 
frequency recovery using the atomic £0 norm minimization and 
the parameter identihability problem that has been well studied 
in the full data case where L is an identity matrix, see 

gg. It is well known that identihability is a prerequisite 
for recovery. As a result, the atomic £q norm minimization 
provides the strongest theoretical guarantee possible. 


B. Precise Formulation of the Atomic £q Norm 

For the purpose of computation, a hnite-dimensional for¬ 
mulation of the atomic £0 norm is required. To do that, we 
assume that ||y IU,o = r <r, where r is known. In fact, we 
can always let r = iV -f 1 since y can always be written as 
a linear combination of N sinusoids with different frequen¬ 
cies; of course a tighter bound leads to lower computational 
complexity (see below). 

For any G and > ni, I = note that 

a [f.j] = an {f,j) is a subvector of a' (f.^) := an' (f.j). 
Let LI be the index set which is such that a ) = (f -.j)- 
Similarly, T — Tn is a submatrix of T' := T„/. We have the 
following result. 


Theorem 3. Assume that ||y||_ 4 Q = r < f. Let n'l > 
max(n;, r), I = Then, |ly||_ 4 Q equals the optimal 

value of the optimization problem 


min rank (T') , 

t.T'.y' '■ ' 


subject to 


iH' 


y 

T' 


> 0 , y'n = y, 


(40) 


where the objective function rank {T''} can be replaced by 


rank 


y 


yfH- 

T' 


Proof: Using the fact that |jy|l^ p = r, we have that y ad¬ 
mits an order-r atomic decomposition asy — (f j)- 

Then, we can construct a feasible solution as 


if T', y') 


El 

i=i 


= I r, > |c,ra' [f.j) a'^ [f.j) 


E 

i=i 




(41) 


since y'^ = (f -i) = E=i ^ 


y 


3 

y>H 

T' 


= E 

i=i 


r r 

E 

i=i 


1 

cja'^ 

{f:,) 

ICjfa' (/:j 

1 

1 

o’ 



(/.) 




(/.)J 


(42) 


> 0 . 

It follows that r* < rank (T') = r = ||y||_ 4 Q, where r* 
denotes the optimal solution of the problem in (|4g. On 
the other hand, when ( |40| achieves the optimal value r* 
at the optimizer (t*,T'*,y'*), we have that rank (T'*) = 
r* < r < min; n'l. It follows by Theorem that T'* 
admits a unique Vandermonde decomposition as T'* = 

if*j) Therefore, there exists s* such 

that y'* = {f*j) since y'* lies in the range space 

of T'*. It follows that y = y^ = hence 

\\y\\A,o - conclude that |ly||^ p = r*. 

When the objective function rank {T'} is replaced by 
' t y'^1 \ 

, the stated result follows by a similar 


rank 


rank 


y 

,y T'^ 

argument. In fact, in the hrst part of the proof, using 
the same constructed feasible solution we have that r* < 

2/11^0- Then, in the second part 

[t\T'\y'-' 

y'""^ 


y 


y 

T' 




of the proof, at the optimizer 

y 


rank (T'*) < rank 


same arguments can then be invoked to show that ||y||^Q ^ 


we have that 
= r* < r < min; nj . The 
< 


By ( |40l l, the problem in ( |39| l can be written as the following 
rank minimization problem: 

..’HI 


min rank (T'} , subject to 


y 

T' 


> 0, y^ G C. 


( 43 ) 
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Once ( |4^ is solved, the estimate of y is obtained as the 
frequencies can be retrieved from the Vandermonde decompo¬ 
sition of T' which can be computed using the MaPP algorithm 
proposed in Section III-B[ and an atomic decomposition of y 
follows naturally. 

Remark 4. In practice, due to the possible absence of a tight 
upper bound r and/or computational considerations, we may 
also be interested in the following problem: 

y T 


min rank (T), subject to 


> 0 , 


(44) 


which corresponds to setting n[ = ni, I = 1,..., d in ( |40| l. Let 
{t *, T*) be the optimal solution of with r* = rank (T*). 
Then, by the proof of Theorem a © precisely characterizes 
||y||_^g if T* admits a Vandermonde decomposition of order 
r*, which is guaranteed by Theorem c* < min; n;. 
Otherwise, the existence of the Vandermonde decomposition of 
T* can be checked using the MaPP proposed in Section III-C 


This provides a checking mechanism for determining whether 
the dimension-reduced problem in ([g achieves \\y\\_^ q. At the 
same time, it also provides an approach to frequency retrieval 
from the solution of 

C. Solution via Convex Relaxation 

We have shown in Theorem that the atomic fg norm 
minimization is a rank minimization problem. For this type 
of problem the nuclear norm relaxation has been proven to be 
a practical and powerful tool 0^ ED- So we consider the 
following nuclear norm/trace minimization problem [by relax¬ 


ing rank 




min f-l-tr(T') 

t.T'.y' ' ' 


subject to 


yiH- 

T' 


> 0 , y'n = y- 


(45) 


I = 


It turns out that ED with appropriate choices of 

is nothing but (the dual problem of) the SDP for¬ 
mulation (up to a factor equal to 5 ) of the atomic norm (24| 
defined as 


\y\\A ■= 


c,G' 


inf 


j 



(46) 

The atomic norm was shown in p2) to successfully recover 
the frequencies if they are sufficiently separated. It is easy 
to show that the optimal objective value of ( |45] l provides a 
lower bound for the atomic norm. However, the problem of 
choosing n[, I = I,... ,d such that ( |45| l is guaranteed to 
characterize ||y||^ is open. The paper 


Vandermonde decomposition (see also 0 ). Therefore, simi¬ 
larly to what was said in Remark]^ a checking mechanism can 
be implemented by verifying whether rank (T'* ) < min/ n[ 
holds or otherwise finding a Vandermonde decomposition of 
T'* using MaPP. Compared to the dual polynomial method 
in p4) , this method requires less computations and is more 
practical. 

D. Solution via Reweighted Minimization 
Let 


t 


y/H 

T' 


M. {y') := minrank (T') , subject to 

be a metric of y'. Then, ( |4^ can be rewritten as 
mmAA [y '), subject to y'^ G C. 


To approximately solve ( |48] l [or ( |4^ , ([39ll], inspired by 
we propose a smooth surrogate for A4 {y')'. 

■44'^ (y') =minf -f In \T' -f el\ , 

t T' 


> 0 (47) 


(48) 


subject to 


yfH 

T' 


> 0 , 


(49) 


where e > 0 is a regularization parameter. In ( |49] l. In |T' -f e/| 
is a smooth surrogate for rank (T') in ( |47| l and the additional 
term t is included in the objective function to control the 
magnitude of T' and avoid a trivial solution. Similarly to 1231 
we have the following result. 


Proposition 3. Let e 

true: 


0. Then, the following statements hold 


\) If M. [y') < N' := Y\.i=i^'i’ 


AL (y) -- {M (y') - N') In 


(50) 


i.e., limg_>o 


M‘(v) 


-TT = 1 ; otherwise, [y') is 


it involves a dD search over all frequencies at which the so- 
called dual polynomial achieves the maximum magnitude, and 
is hard to implement. 

Using the Vandermonde decomposition of MLT matrices we 
can provide another checking mechanism as follows. It can be 
shown that (|45]l achieves ||y||^ if the solution T'* admits a 


a constant; 

2) Let T'f be the optimizer ofT' in Then, the smallest 
TV' — Af (y') eigenvalues of T'f are either zero or 
approach zero at least as fast as e, and any cluster point 
of T'f at e = 0 has rank equal to A4 (y'). 

Proof: See Appendix [B| ■ 

Next, we consider the minimization of Al*^ (y'). Therefore, 
rather than directly solving ED- we replace the objective 
function in ( |4^ by that in ED- Following the developments in 
| [ 2 D |, a locally convergent iterative algorithm can be derived 
in which the jth iterate of T', denoted by T' , is obtained 
as the optimizer of the following weighted trace minimization 
problem: 


(51) 


p4l provides the only 


r 1 

e solution of ED- 

min t -f tr 

T' 


subject to the constraints in ED- 


Here {e, > 0 : j > 1 } is a monotonically decreasing se¬ 
quence. The resulting algorithm is designated as reweighted 
trace minimization (RWTM). Let Tg = 0 and eg = 1. Then 
the first iteration is exactly the convex relaxation method 
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Fig. 1. The maximum, median and minimum errors of frequency retrieval in 
the Vandermonde decomposition result using MaPP. 1000 Monte Cai'lo runs 
are carried out for each r. 


introduced in Section IV-C (see (|45]l). The iterative reweighted 
process has the potential of enhancing sparsity and resolu¬ 
tion (see the ID case results in ||23)). Note that a practical 
implementation of RWTM will trade off performance for 
computation time by keeping the number of iterations small. 


V. Numerical Simulations 
A. Vandermonde Decomposition 

We numerically study the performance of the proposed 
MaPP algorithm for finding the Vandermonde decomposition. 
In particular, we consider a 2D case with n = (6, 8) and 
let r = 1,... ,N = 48. In each problem instance, 2r 
frequencies are unifonnly generated at random in T and from 
them we form r 2D frequencies f.j, j = l,...,r. The 
power parameters pj, j = 1,..., r are generated as + 0.5, 
where w has a standard normal distribution. Then the 2LT 
matrix T is obtained as in 0- After that, MaPP is used 
to find a Vandermonde decomposition of T of order r. The 
error of frequency retrieval is measured as the maximum 
absolute error of the 2r frequencies. For each r, 1000 Monte 
Carlo runs are carried out. Note that MaPP only works when 

r < N -—V = 40 since otherwise the matrix pencil 

will have eigenvalues equal to infinity. The simulation result 
for r < 40 is presented in Fig. [T] Consistently with Theorem 
1^ and Proposition [T] it can be seen that for r < 40 MaPP 
can always retrieve the frequencies and find the Vandermonde 
decomposition within numerical precision. We observed that 
a relatively large numerical eiTor occurs in the presence of 
closely spaced frequencies. Moreover, the numerical errors 
propagate quickly as r gets close to 40. 


B. Super-Resolution 

RWTM is implemented in Matlab and the involved SDPs 
are solved using SDPT3 |j^. We set Tq = 0 and eg = 1, 
and so the first iteration of RWTM coincides with the convex 


relaxation method in Section |IV-C| We set ei to be equal to 
0.1 times the largest eigenvalue of ^nd let 


e, = 


1 , 
es, 


j = 2,... ,8; 

J>8. 


(52) 


RWTM is terminated if the ^2 norm of the relative change of 
y' at two consecutive iterations is lower than 10“® or the max¬ 
imum number of iterations, set to 20, is reached. Given the T' 
obtained with RWTM, MaPP is used to retrieve the frequency 
estimates by finding its Vandermonde decomposition. 

We first present an illustrative example to demonstrate the 
effectiveness of the proposed RWTM and frequency retrieval 
methods. The true values of eight (r = 8) 2D frequencies are 


plotted in Fig. 2(a) Two of them share a common frequency 


value in the first dimension; two share a common value in the 
second dimension; and another two are closely located with a 
Euclidean distance of about 0.045 (as indicated by the black 
dashed lines or circle). A number of 50 randomly located 
noiseless measurements are collected from N = 10 x 10 
uniform samples, with the complex amplitudes of the 2D 
sinusoids being randomly generated from a standard com¬ 
plex normal distribution. Note that the two closely located 
frequencies are separated by only which is much smaller 
than the resolution limit condition in fT^ , for the atomic 
norm method. Assume we know that r < r = 12 and so let 
n[ = n '2 = 12. We use RWTM and MaPP to estimate the 
sinusoidal signal and the frequencies. 

The RWTM algorithm ends in four iterations. The simula¬ 
tion results are presented in Fig. It can be seen from Fig. 
2(b) that RWTM gradually reduces the rank of T' and finally 
produces a solution of rank 8. From this low-rank solution, 
the proposed MaPP algorithm successfully retrieves the true 


frequencies as shown in Fig. 2(a) The estimation errors of the 
frequencies (in £2 norm) are on the order of 10“^^. It takes 
between 9.5s and 14.8s to run one iteration of RWTM. 

It is worth noting that the convex relaxation method, des¬ 
ignated by ConvRelax, at the first iteration does not produce 
a sufficiently low-rank T'. Consequently, it cannot correctly 


recover the frequencies and the signal. Fig. 2(c) plots the 


magnitude of the dual polynomial whose peaks of magnitude 
1 are used to identify the frequency poles. Note that in this 
example many frequency poles are present and a continuous 
band with magnitude greater than 0.9999 is present around 
the two closely located true frequencies. It is a challenging 
task to accurately locate all these peaks using a 2D search, 
and therefore it is difficult for the checking mechanism in 
| l24l to determine whether ConvRelax realizes the atomic norm 
method. 

Next, we turn to the new checking mechanism proposed 


in Section IV-C based on finding a Vandermonde decom¬ 
position of T'. In this example, we count the eigenvalues 
of T' above a threshold of 10“®, which gives the number 
44 that is used as the matrix rank in MaPP. After that, the 
proposed MaPP algorithm is used to find a Vandermonde 
decomposition of order 44 with a relative eiTor, measured by 

can conclude that ConvRelax indeed realizes the atomic norm 
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Fig. 2. Super-resolution results in an illustrative example, (a) Ground truth and the frequencies recovered by RWTM + MaPP. (b) Eigenvalues of the iterates 
of T' . (c) Contour plot of the magnitude of the dual polynomial of ConvRelax. (d) The frequencies recovered by ConvRelax -H MaPP (at the first iteration). 


method. The frequencies in the Vandermonde decomposition 
are presented in Fig. |2(d) which well match the peaks of the 
dual polynomial shown in Fig. |2(c) 


In the following simulation we study the sparse recovery 
capabilities of RWTM and ConvRelax in terms of sparsity- 
separation phase transition that was first introduced in p3j . We 
consider a 2D case with n = (8,8). In each Monte Carlo run, 
a number of 32 randomly located noiseless measurements are 
collected from the = 64 uniform samples, with the complex 
amplitudes of the 2D sinusoids being randomly generated 
from a standard complex normal distribution. The number 
of sinusoids r that we consider ranges from 1 to 16. Any 
two 2D frequencies are separated (in norm following fl^ , 

g ) by at least A/ which takes on the values 0, ^,... 

randomly generate a set of 2D frequencies, a new 2D 
frequency is randomly generated at one time and added to 
the set if the separation condition is satisfied and the process 
is repeated until r frequencies are obtained. For each pair 
(A/,r), 20 data instances are generated and the signal and 


its frequencies are estimated using RWTM and ConvRelax. 
For speed consideration we set n' = n in both RWTM and 
ConvRelax. Successful recovery is declared if the relative root 
mean squared error (RMSE) of signal recovery is less than 
10“® and the error of frequency recovery (in £oo norm) is less 
than 10“® (our experience suggests that these two conditions 
are satisfied or violated jointly). 

The simulation results are presented in Fig. In particular. 
Fig. 3(a) and Fig. |3(b) plot, respectively, the success rates of 
ConvRelax and RWTM, where white means complete success 
and black means complete failure. Both methods can recover 
the signal and the frequencies in the regime of few sinusoids 
and large frequency separation, leading to phase transitions 
in the sparsity-separation plane. The phase transitions are not 
sharp, as in | |23) , since the frequencies are separated by at 
least Af and well separated frequencies can still be obtained 
for small values of Af. RWTM clearly has an larger success 
region compared to ConvRelax. To illustrate this more clearly. 


we plot in Fig. 3(c) the percentage of the total number of cases 
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Fig. 3. Super-resolution results, (a) Success rates of ConvRelax. (b) Success rates of RWTM. (c) The percentage of the total number of cases in which 
RWTM succeeds but ConvRelax fails, (d) Success rates of generating the set of 2D frequencies. Here white means complete success while black means 
complete failure. 


in which RWTM succeeds but ConvRelax fails. It can be seen 
that a significant number of the generated problems are of such 
a type, and that they are concentrated in the regime of median 
sparsity and/or small frequency separation. So, compared to 
ConvRelax, RWTM has improved sparsity recovery capability 
and enhanced resolution. Finally, it should be noted that not 
all problem instances are readily generated. To be specific, the 
frequency set is hard to generate in a reasonable amount of 
time in the regime of large r and large A/ (see Fig. 3(d) i. 
ConvRelax and RWTM have low success rates in this regime 
partly due to this reason. 


Finally, we consider the noisy case and compare the pro¬ 
posed super-resolution methods with a conventional subspace 
method. We select the weighted improved multidimensional 
folding (WIMDF) algorithm in O) as a benchmark. Note that 


the WIMDF algorithm (as in fact other subspace methods) 
does not work in the compressive data case and thus it is 
only considered in the full data case. In contrast to this, the 
proposed RWTM and ConvRelax methods are also considered 
in a compressive data case in which 80% of the data samples 
(randomly selected) are used for frequency estimation. The 
proposed methods are implemented as in the noiseless case 
but with the feasible set C in ( |39l l re-defined as 

C = |y : II 2 : - Ly\\l < , (53) 

where 2 ; G denotes the vector of noisy measurements, L G 
{0,1}^^^ is the selection matrix representing the sampling 
scheme, M is the sample size, and rf^ is an upper bound on the 
noise energy. Note that L is the identity matrix and M = N m 
the full data case. We also note that RWTM will be terminated 
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(a) (b) 

Fig. 4. MSE results of 2D frequency estimation in the noisy case, (a) The full data case in which n X n uniform samples are acquired, (b) The compressive 
data case in which 80% of the data samples are used to estimate the frequencies. 


in five iterations. 

In this simulation, we consider a 2D case with three (r = 3) 
sinusoids that have frequencies (0.25, 0.55), (0.45, 0.55) and 
(0.45,0.35) and amplitudes ^m. 385^ ^i0.076^^ 

respectively. We let ni = n2 = n and consider different 
values of n ranging from 4 to 10. We add white complex 
Gaussian noise to the n x n uniform samples and let the 
noise variance be cr^ = ^ so that the signal to noise ratio 
(SNR) of the samples acquired following the data model in 
([^ is approximately constant (in our simulation, the averaged 
SNR for each n was between 14.6 and 15.3 dB). We set 

= (^M + 2y/M^ (i.e., mean + twice standard deviation) 

to upper bound the noise energy with high probability. This 
means that the noise variance is assumed to be known in 
the proposed methods while, somewhat similarly, WIMDF 
assumes that the number of frequencies K is known. Since 
the proposed methods might produce spurious frequencies, 
the strongest three frequency components are used as the 
frequency estimates based on which the mean squared error 
(MSE) is computed (over 100 Monte Carlo runs). Regarding 
this aspect we note that RWTM + MaPP rarely overestimates 
the number of frequencies: this happened in only 4 out of 
1400 Monte Carlo runs in our simulation (mainly due to the 
fact that the true noise energy can exceed ry^); on the other 
hand, ConvRelax + MaPP produced spurious frequencies with 
relatively weak powers in about 40% of the runs. 

The simulation results on the MSE of frequency estimation 
are presented in Eig. We hrst note that the MSE curves of 
RWTM and ConvRelax almost coincide with each other. This 
implies that in this example ConvRelax + MaPP can accu¬ 
rately localize the true frequencies by selecting the strongest 
frequency components and the main advantage of RWTM H- 
MaPP is to eliminate the spurious frequencies that ConvRelax 
+ MaPP produces. In the left subhgure, the proposed methods 
are compared with WIMDE as well as the Cramer-Rao lower 


bound (CRLB), which is computed following JTT] , in the full 
data case. It can be seen that in all the scenarios considered 
RWTM + MaPP is either comparable with or better than 
WIMDE. Eor n — 6 and n = 7, compared to WIMDE, the 
MSE improvement of RWTM + MaPP is more than 3 dB. The 
right subhgure shows that for the proposed super-resolution 
methods 20% data loss results in only a small degradation of 
1.3 to 3 dB for RWTM + MaPP when n > 5. For n = 4, 
the performance loss for RWTM + MaPP is larger because in 
some runs (14 out of 100) RWTM + MaPP can only detect 
two sinusoids. In contrast, WIMDF cannot work at all in the 
compressive data case. 

We next show the frequency estimates of WIMDF and 
RWTM + MaPP in Fig. (the results for ConvRelax H- 
MaPP are omitted due to the presence of several spurious 
frequencies). For n = 4, while WIMDF and RWTM + MaPP 
are comparable in terms of MSE, see Fig. it can be seen in 
Fig.|^that the three frequencies can be more clearly separated 
by RWTM + MaPP (with a small bias though). The same 
behavior can be observed for n = 5 as well. It can also be 
seen from this hgure that for RWTM + MaPP the 20% data 
loss causes only a small performance degradation for n = 5 
and n = 6. 

Since WIMDF is a subspace method and its main compu¬ 
tations include a singular value decomposition and a small- 
scale optimization problem for parameter tuning, it is very 
fast in practice and needs about 0.1s on average for a single 
run. Because RWTM + MaPP adopts a more sophisticated 
optimization method, it requires 3 to 38s on average when 
n increases from 4 to 10, while the computational time of 
ConvRelax + MaPP is about 1/5 of that required by RWTM H- 
MaPP. The increased computational cost of RWTM + MaPP 
and ConvRelax + MaPP can be justihed by the fact that, unlike 
WIMDF, they can be applied to the compressive data case. It is 
also worth noting that the proposed methods can be accelerated 
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Fig. 5. 2D frequency estimation results in the noisy case. The true frequencies are indicated using black circles while their estimates are shown as blue/red 
'+’. 1st row: n = 4; 2nd row: n = 5; 3rd row: n = 6. 1st column: WIMDF with full data; 2nd column: RWTM + MaPP with full data; 3rd column; RWTM 
+ MaPP with compressive (80%) data. 


using faster solvers for SDP, e.g., the ADMM algorithm | |4^ 
(see fl^ , pO) , for examples in the ID case). 

VI. Conclusion 

In this paper, the Vandermonde decomposition of Toeplitz 
matrices was generalized from the ID to the MD case under a 
rank condition. When this condition is not satisfied a numerical 
approach was also proposed for finding a possible decompo¬ 
sition. The new results were used to study the MD super¬ 
resolution problem and practical algorithms were proposed. 
Extensive numerical simulations were provided to validate our 
theoretical findings and demonstrate the effectiveness of the 
proposed super-resolution methods. 

The result on Vandermonde decomposition presented in this 
paper is closely related to operator theory and structured linear 
algebra. Its application to these areas would be of interest. 


When the matrix rank is high, the question on existence of 
the Vandermonde decomposition is still open, which should 
also be studied in the future. 
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Appendix 

A. Proof of Lemma 

To prove Lemma for d = 1, we will use the following 
result. 
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Lemma 4. Consider a Hankel matrix H S defined as 


H = 


hi 

h2 


hji 


If H can be written as 


^2 

h^ 


^n+1 


hn 

^n+1 

h2n-l 


(54) 


H = A{f)CA^if), (55) 


where C € r < n, and fj, j = are distinct 

points in T, then C must be a diagonal matrix. 


Proof: We make use of the Kronecker’s theorem for 
Hankel matrices (see, e.g., 1391). Let r' = rank (C) < r. 
Also let Hn-i & be the principal submatrix of 

H obtained by removing the last row and column. It follows 
that 

(56) 


As both A (/) and A„_i (/) have full column rank provided 
that r < n, it holds that rank(ff) = rank(JL„_i) = 
rank(C) = r'. By |39 Theorem 3.1] H can be factorized 
as „ 


H = AC A 


(57) 


where A G C"^’’ is a generalized Vandermonde matrix and 
C £ C*" is an invertible block diagonal matrix. Interested 
readers are referred to |j^ for the specific forms of A and 
C. We will use the following facts: 1) any nxr' generalized 
Vandermonde matrix has full column rank if n > r', and 2) 
C becomes a diagonal matrix if A is a Vandermonde matrix. 
From the equality 

ACA^ = ACA^ (58) 


it follows that A = A 


CA^ 



. This means that 


each column 5_,, j = 1,..., r' in A is a linear combination of 
the columns in A and thus the n x (r + 1) matrix [Sj, A] is 
rank-deficient. By the assumption that n > r + 1 and the 
properties of generalized Vandermonde matrices mentioned 
above, it follows that aj is a column in A and thus A is 
a Vandermonde matrix formed by r' columns in A. It also 
follows that C is a diagonal matrix. As a result, we conclude 
by ( |58] l th^ C is a diagonal matrix whose diagonal consists 
of that in C and r — r' zeros up to re-sorting. ■ 

Now we can prove Lemma in the case of d = 1 for which 
becomes 

T = A (/) CA^ if). (59) 


Let H and A be the matrices obtained by sorting the rows 
of T and A in reverse order, respectively. It is obvious that 
JT is a Hankel matrix and H = ACA^. Moreover, note that 
A = Adiag _ _ ^g*27i-(n-i)/r^^ where • denotes 

the complex conjugate operator. It follows that 

H = Adiag ..., CA^ (60) 

which is still a Hankel matrix. By Lemma we conclude that 
diag ^ g-z27i-(n-i)/,.^ C is a diagonal matrix 

and so is C. 


Now suppose that Lemma holds for d = do ~ do > 2. 
By induction it suffices to show that Lemma also holds for 
d = do. Let us view T as an rii x ni block Toeplitz matrix. 
For the (j -f 1, fc -f l)th block, 0 < j, fc < ni — 1 the following 
identity holds by ( |20| : 


T(j + l,fc + l) 

= A„,, (/_i) ZiCZr"A^_^ (/_i), 


( 61 ) 


where f denotes / after removing the first row and 

Zi = diag ^ g*27r/i,.^^ < r, 

denote the non-redundant collection of and let 

r £ {0, be the matrix which is such that f_i = f_iT. 

Then, ( |6T] i becomes 

T0- + l,fc + l) 

= A„_, (/_i) rzjczr'^r^A^,^ (/_i). 

Note that T {j + l,k + l)j_0 < j,k < ni — 1 are all n_i, 
(do — 1)LT matrices and f-ij, j = are distinct 

points in By the assumption that Lemma holds for 

d = do — 1 we have that 

D (j, k) := TZ{CZf^T^, 0 < j, A: < m - 1 (63) 


are all diagonal matrices. 

Let C{j,k) — Its (p, q)th entry satisfies the 

equation: 

Cpg (j, k) = . (64) 


We next study some properties of F. Define Sm = 
{j ■ ^mj = 1}^ nt = 1,... ,r[. According to the definition 
of r it holds that 

= (65) 


Therefore, Sm, m = are disjoint subsets of 

{I,...,?-} with Um=l‘S'm = {1,..., r}. Moreover, fip, 
p £ Sm are distinct for any m = l,...,r( since f.p 
j = 1,..., r are distinct. 

Let Tm be the mth row of F. Using ( [64l i we can write the 
(to, n)th entry of D {j, k) in (|6^ as 


Dmn (j, k) = TmC (j, k) F’ 
pGS™ qes„ 


J2Tr{jfip-kfiq) 


( 66 ) 


Note that ( |6^ holds whenever Q<j,k<ni — 1 and that 
Dmn {j, k) = 0 whenever m n. Therefore, when m n 
we have the following identity: 


A„1 (to) Cs^Sr^^ni (^) = (6”^) 

where A„j (to) = ^ to = 

all have full column rank since fip, p £ Sm are 
distinct, and Cs^r^Sn a submatrix of C with rows indexed 
by Sm and columns indexed by Sn- It immediately follows 
that 


CSrr^Sn = ^ when m n, 


( 68 ) 
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which means that C is a block diagonal matrix after properly 
sorting its rows and columns with respect to Sm, m = 
1 r' 

Next, we show that C is also a block diagonal matrix when 
its rows and columns are sorted in another way. Let "P be the 
permutation matrix satisfying that 


Va (/') = (/^) (g) 



for any /' € Then, 


2^(2) _ -pjrpT’ 


(69) 


(70) 


remains a dLT matrix by exchanging the roles of the hrst 
and the second dimension of the dD frequencies. By viewing 
as an n2 x 712 block Toeplitz matrix, we can repeat the 
analysis above. In particular, we can similarly dehne P' and 
S'^ according to the partition of {f_ 2 j}._^, where /_2j 
denotes f .j with the second element removed. Then C is a 
block diagonal matrix when its rows and columns are sorted 
based on S'^. 

Now we are ready to show that C is a diagonal matrix 
using contradiction. To do so, suppose that Cpq ^ 0 for some 
p ^ q. By ( |68| l there must exist some m such that p,q G Sm- 
It follows from ( |65| l that /_i p = f-i,q — Similarly, 

there must exist some m' such that p,q G S'm> and thus 
f_2p = f -2 q- It therefore holds that /.p = f which 
contradicts the assumption that f p j = 1,. . ., r are distinct. 


Ej G (0,1), j = 1,2,... such that Xej,M(y’) < }■ Since 
(f*, T'*) is bounded, without loss of generality, we assume 
that T'*_^ -g (fg, Tg*), as j -G + 00 . As a result, 

rank(TQ) < M{y') since Xe-^Miy') 0^ as j -G + 00 . 
On the other hand, it must hold that 


y 

T\ 


fH 


= lim 

j^+00 


y 

T 


fH- 


> 0 . 


(75) 


Therefore, (fg, Tg*) is a feasible solution to the problem in 
It follows that A4 {y') < rank(rg*), contradicting the 
fact that rank (Tg*) < M.{y') as shown previously. 

In Step 3, we prove the hrst part of the proposition. By ( |7^ 
and the inequality Xc,M(y') ^ c shown in Step 2 we have that 


N' 


(y) (Ae.j + e) 

> {N' - M {y')) Ine + Cl, 


where ci = In c is a constant independent of e. On 

the other hand, since any optimizer of the problem in ( |47] i is 
a feasible solution to the problem in (|49l), it is easy to see that 


M^{y)<{N'-M{y'))\nE + c 2 (77) 


where C 2 is also a constant. Combining ( |7^ and CZ) proves 
the hrst part. 

To prove the second part of the proposition, in Step 4, we 
rehne as 


B. Proof of Proposition 

We complete the proof in four steps. In Step 1, we show 
that the optimizer (f*, T'*) of the problem in ( |49l l is bounded 
for e e (0,1). Let T'f = be the eigen- 

decomposition of T'f, where the eigenvalues X^j, j = 


and p^j = . Then we have that 

7’e — 

Pe 


N' 


Tf — 

Pe 


(y) = H IT 


1=1 


i=i 


By the optimality of X^j, it holds that 


At.j+e 


= 0 and thus 


P^,3 = 




Xe.i -f e 




{y) > {N' — M. {y')) In e 

N' 

+ E >" 

j=M{y’)+i 


-fl 

A. 


e 


+ 1 I + Cl. 


(78) 


rank (T'*) 

In 

(71) 

N' 

< ^ \n 

j-M{y')+l 

(72) 

< C2 - Cl. 


<f,3 

e 


(79) 


(73) 


Inserting ( |73| l into ( |7T] l, we have that t* < < N' is bounded. 

Moreover, by (|7^ we also have that 


It follows that 

A..JV' < • • • < X,^M{y')+i < - 1) e. (80) 

Moreover, by ( [SO] ) any cluster point of T'f at e = 0 has rank 
no greater than Ai [y')- On the other hand, the rank is no 
less than AA {y') by the result in Step 2. This observation 
completes the proof. 
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